A "brainlike" paradigm offers computing abilities not available to standard
algorithmic techniques, and shows potential for building high-quality speech
recognizers.


Stephen  A.  Luse 

Senior  Engineer 
Naval Ocean Systems Center

San Diego, CA

This is a general introduction to the re-emerging technology called neural
networks, and to how these networks may provide an important alternative to
traditional forms of computing in speech applications. Neural networks,
sometimes called Artificial Neural Systems (ANS), have shown promise for
solving problems that traditional algorithmic and AI (artificial
intelligence) approaches have found difficult. The world's greatest
supercomputer calculates pi to thousands of decimal places in seconds using
algorithmic techniques, but it may never be able to recognize a smiling
human face when only a non-smiling version of this face is available for
comparison!

One reason for this is that computers process information serially, and an
incredibly large number of serial steps is required to perform such a task.
Therefore, even with the fastest computer, developing algorithms that can
ignore unimportant differences in images and match stored patterns with
acceptable time delays is not an easy feat. The brain, on the other hand,
processes information in a parallel fashion, distributing information and
processing tasks throughout many neurons and their interconnections. ANS
processors mimic this parallel structure and are able to outperform serial
processors for certain tasks. They can also learn from their environment and
are highly tolerant of internal failures.

Text-to-Speech Conversion: A Mapping Problem

Neural networks are paradigms, or models, that "compute" in a brainlike
fashion. While neural networks do not emulate the brain's full complexity,
their structure is "brain-inspired" in that they consist of groups of
interconnected "neurons" called processing elements. In practice, these
processing elements are interconnected in different ways to form different
processing architectures with different functionalities. For example, some
networks can perform associative memory functions that map arbitrary input
and output pairs, while others can group inputs into classes, or even
self-organize and perform feature extraction on data presented to them.
Other architectures reconstruct full sets of data when given only parts or
corrupted versions of what is stored (content-addressable memory functions).
In other words, they take an incomplete input and supply the missing parts.
Still other networks have the ability to generalize and combine knowledge in
a way that allows extraction of consensus and offsetting of differing
opinions among experts in a specific domain. More complex networks can model
the concepts of memory and motor control in biological systems to show
possible mechanisms by which organisms learn to interact with an
environment.
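
The content-addressable memory behavior described above (given a corrupted
input, supply the missing parts) can be sketched in a few lines. This is only
an illustration under stated assumptions: it uses a Hopfield-style network
with bipolar (-1, +1) units and an outer-product storage rule, which the
article does not specify, and the names `store` and `recall` are hypothetical.

```python
def store(patterns):
    """Build a symmetric weight matrix from bipolar (-1/+1) patterns.

    Assumption: outer-product storage with a zero diagonal, as in a
    Hopfield-style network. The article does not give a storage rule.
    """
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, probe, max_steps=10):
    """Update all units together until the state stops changing."""
    state = list(probe)
    for _ in range(max_steps):
        new_state = [
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights
        ]
        if new_state == state:
            break
        state = new_state
    return state


# Two stored patterns (chosen to be orthogonal for this small example).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = store(stored)

# Corrupt the first pattern by flipping its first element, then recall.
probe = list(stored[0])
probe[0] = -probe[0]
restored = recall(weights, probe)
print(restored == stored[0])
```

With only eight units the capacity of such a memory is very small; the point
is simply that recall starts from a partial or damaged pattern rather than
from an address.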

With all of this functionality, neural networks seem to have at least some
potential for building high-quality speech recognizers with superior
capabilities for speaker independence and continuous speech. Current
speech-recognition implementations suffer from large data storage
requirements and relatively long processing times, due to the algorithmic
processes involved. These requirements make real-time speech-recognition
systems difficult to achieve. As ANS technology and applications become
refined and true neural network computers become available, we may see
neural networks solving problems related to syntax and knowledge extraction
of recognized speech without the limitations imposed by algorithmic
approaches.

An experiment performed by Terrence Sejnowski and Charles Rosenberg, at
Johns Hopkins University, shows the potential for neural network
applications in the speech field. They used a neural network called the
back-propagation model (see sidebar The Back-Propagation Network) to address
the problem of synthesizing speech from text. They call their network
NETtalk.

Sejnowski and Rosenberg realized that text-to-speech conversion is a mapping
problem and assumed that some function exists that can perform the necessary
mapping. The abilities of the back-propagation network made it unnecessary
to know the form of this mapping function. The back-propagation network has
the ability to compare a set of inputs with desired outputs and generalize
the implied relationship, developing its own internal representation of the
function.


Fig. 1. NETtalk architecture: 203 input units (7 groups of 29), 80 hidden
units, and 26 output units.

The Architecture

NETtalk's architecture consists of 203 input units, 80 hidden units and 26
output units (see Fig. 1). The input layer is a cluster of seven groups.
Each processing element in each group encodes one letter of input text (A
through Z, two punctuation characters and a space character used to separate
word boundaries). Seven groups allow the network to view a window of seven
characters at one time. The desired output from the network is the correct
phoneme associated with the center, or fourth, letter of this window. The
three letters on either side of the center letter provide partial context
for determination of the proper phoneme.

The output units of the network encode phonemic information represented by
twenty-one articulatory features such as point of articulation, voicing,
vowel height, and so on. Five additional units encode stress and syllable
boundaries for a total of 26 output units. These output features drive a
DECtalk phoneme synthesizer, bypassing its internal text-to-speech
algorithm.
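
The input coding described above can be sketched as follows. This is an
illustration, not the authors' code: the choice of period and comma as the
two punctuation characters, the padding of the text with spaces, and the
names `encode_window` and `windows` are all assumptions. Each of the seven
window positions becomes a group of 29 units with exactly one unit switched
on, giving 7 x 29 = 203 inputs.

```python
import string

# 26 letters plus a space and two punctuation marks = 29 symbols per group.
# Assumption: the article does not say which two punctuation marks are used.
ALPHABET = string.ascii_lowercase + " .,"
WINDOW = 7  # seven letter groups, centered on the fourth letter


def encode_window(window):
    """Turn a 7-character window into 203 binary inputs (7 groups of 29)."""
    if len(window) != WINDOW:
        raise ValueError("window must contain exactly 7 characters")
    units = []
    for ch in window:
        group = [0] * len(ALPHABET)
        # str.index raises ValueError for characters outside the alphabet.
        group[ALPHABET.index(ch)] = 1
        units.extend(group)
    return units


def windows(text):
    """Yield (center letter, 7-character window), shifting one letter at a time."""
    text = text.lower()
    pad = " " * (WINDOW // 2)  # assumption: pad both ends with spaces
    padded = pad + text + pad
    for i in range(len(text)):
        yield text[i], padded[i:i + WINDOW]


steps = list(windows("a cat"))
center, window = steps[2]
inputs = encode_window(window)
print(repr(center), repr(window), len(inputs), sum(inputs))
```

In training, each window would be paired with the desired phoneme and stress
features for its center letter; those target codes are not reproduced here.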


The Network Makes Rules

As a training set, 1024 words of transcribed natural speech from a
six-year-old child were used. Each input word was assigned a set of output
phonemes and stresses that made the word sound natural when played through
DECtalk. The network learned the relationship between the input words and
output phonemes by stepping each word through the seven-character window,
shifting one character at a time. After 10 passes through the training set,
the synthesized text was understandable, and after 50 passes, the error rate
was less than 5%.

To determine if the network merely memorized the training words or whether
it actually determined its own internal rules for pronunciation, Sejnowski
presented the network with 439 different words. The performance on these
words was 78%, which showed that the network did in fact make generalized
rules with considerable success. The network performed quite well on novel
(untrained) words and accurately transformed them to appropriate phonemic
representations.

Enhancements to this experiment have demonstrated interesting properties of
the back-propagation network. Plotting learning curves of the network as
error rates, Sejnowski noticed that learning follows a power law, which is
characteristic of human learning. The network also exhibits resistance to
damage. Random perturbations of the network connection weights had little
effect on the performance of a highly trained network, and retraining of the
damaged network occurred much more quickly than the original training.


Potential and Applications of the Concept

Sejnowski's experimental synthesizer is far from becoming a commercial
product because the neural network is simulated on a digital computer and
cannot run in real time, but he has shown through this experiment that
neural networks provide hope for solving complex problems in man-weeks or
months rather than man-years, as required by traditional algorithmic
approaches.

Other researchers have also shown the potential power of the neural network
concept. Dr. Jeff Elman, University of California at San Diego, is using
neural networks as a tool for studying speech perception. Using the
network's ability to internally organize information, he is gaining an
understanding of how humans may organize and classify speech information.

Teuvo Kohonen, Helsinki University of Technology, is developing
self-organizing and content-addressable memory networks. Using these
networks, he has developed a prototype speech recognizer that self-organizes
phonemes into a two-dimensional map. (This map is interesting because the
network groups phonemes according to high-level and low-level features much
as humans have done in speech analysis.) He then processes spoken speech so
that a map of a word is drawn on this two-dimensional phoneme map. A word
spoken multiple times by the same speaker will traverse the map in a
consistent manner due to the network's power to generalize. Another network
uses the word map as input and is able to match a particular map to the
proper corresponding word.

Fig. 2. The Processing Element. Inputs are denoted by x's, while
corresponding interconnect strengths are denoted by w's. The output of the
PE is defined by

$$ \text{output} = f\left( \sum_{i=1}^{N} w_i x_i \right) $$

where $f(\cdot)$ is the activation function.

Neural networks are showing promise in other fields as well. A demonstration
was provided at the International Conference on Neural Networks that took
place recently in San Diego. Using a personal computer, a TV camera, and a
neural network plug-in coprocessor board, the operator could store images of
many faces in the network. A person could then disguise himself with wigs,
glasses, mustaches, etc. A new image was taken and input to the network. The
network recalled the original image of the person with enough accuracy to
demonstrate the use of neural networks as a recognition technique.

Another demonstration showed a character recognizer that was reasonable at
determining which digit, zero through nine, any user writes on a digitizing
pad. The character recognizer works with a degree of accuracy high enough to
demonstrate potential, but what makes this simple demonstration interesting
is that it was developed with less than three man-weeks of work.


A Brief History of Neural Networks


■ The theory of neural networks has evolved from bio-physiological and
neuroscience theories of how neurons process information in the brain.
Early studies by McCulloch and Pitts during the forties led to the
development of the first neural network model in 1943. This model showed
that a neural network could compute and process information in a manner
different from current standard computing devices. However, the model was
extremely limited.

In 1949, Hebb introduced the concept of learning to neural networks. From
observations of how biological neurons learn, he developed the famous
Hebbian Learning Rule, which says that the weight of a synaptic connection
is modified in proportion to the difference between the target and actual
output of a neuron. This modification always tends to move the neuron
(processing element) output in a direction closer to the desired output
values.

Lashley's studies of the mind (1950) led to a formalization that the mind
has a distributed knowledge representation. That is, an idea or concept is
not stored locally, but is stored in a distributed fashion throughout many
cells. This added reinforcement to the original ideas established by
McCulloch and Pitts.

The field of neural systems initially became popular in 1957 when Rosenblatt
developed a neural model which he called the Perceptron. The Perceptron
proved many useful concepts and was the first model developed that had the
ability to learn patterns and make generalizations.

Interest in the field subsided when Minsky and Papert published their book
Perceptrons: An Introduction to Computational Geometry. In this book they
demonstrated Perceptron computational limitations by showing it couldn't
perform basic functions as simple as an exclusive-or (XOR).

Even though neural network research was slowed as a result of Minsky and
Papert's book, it still continued in several areas through the 1960s.
Willshaw extended the mathematical analysis of distributed memory models and
found properties associated with various modeling schemes. Stephen Grossberg
used neural models to explain his concepts of memory and motor control. His
theories are providing powerful insights into the workings of the mind and
laying a foundation for an overall mathematical theory for neural network
operation and design.

In the early 70s Amari made contributions in Boolean network theory while
Anderson developed the concept of completely distributed linear associative
memories. Kohonen successfully explained the concept of self-organizing
associative memory, which organizes and stores information similar to
mechanisms expected to be operating in the brain.

In 1977, the field of Cognitive Psychology began using the concept of neural
networks as cognitive and learning models for speech understanding. This
movement was initiated by Rumelhart, at the University of California at San
Diego, and by McClelland, at Carnegie Mellon University, whose work was
inspired by the HEARSAY Speech Understanding System at Stanford University.
They are responsible for the term "parallel distributed processing," or PDP
models, as cognitive neural networks are often called today.

About this same time commercial applications research was conducted by
Robert Hecht-Nielsen at TRW. While working at TRW he developed the
commercially available Mark III and Mark IV neurocomputers that model neural
networks and run simulations quickly.

The recent resurgence of interest in ANS technology is mostly attributed to
biophysicist John Hopfield, California Institute of Technology. In 1982 he
published a paper which presented the subject in the broader context of
classical mechanics and statistical physics as well as electrical
engineering and information processing. This highlighted neural network
potential to researchers in many fields.

Since that time many interesting developments and contributions have
occurred. One of these came from Bart Kosko of Verac Corporation. He
integrated the concepts of fuzzy logic and fuzzy data with neural networks
through fuzzy cognitive maps and mapping networks that he developed. These
techniques and networks allowed the individual inputs of many experts in a
specific field to be combined and applied without constraining the experts
to form a consensus of opinion. This work shows promise in applications that
are now being done by Expert Systems.

Attendees at the IEEE first annual International Conference on Neural
Networks (ICNN) demonstrated a high degree of interest in the field, as
evidenced by the larger than expected attendance. Talks and tutorials at the
conference showed that neural networks provide powerful concepts for
solutions to problems of pattern recognition, classification, image
processing, speech processing and knowledge processing. Although
demonstrations at the conference showed potential for neural network
applications in the near future, they were only simulations on standard
computers, limiting their overall usefulness.

Current real-time applications will initially be limited to small networks
for signal and image processing. Experts agree that for neural networks to
conquer the difficult problems such as independent recognition of large
vocabularies, knowledge representation, and cognition, they will require
tremendous amounts of parallel processing that cannot be achieved on today's
hardware. Optical processing holds promise for the future in this area.


Processing Elements and Neural Networks

■ Neural networks consist of discrete, independent units called Processing
Elements (PEs), which are analogous to neurons in the brain.

In its simplest form, the PE has multiple inputs and one output (see Fig.
2). Associated with each input is a weight which determines how much
influence the input has on the PE. Inputs to a PE can be from the real world
or from other PEs in the network.

The forms of the inputs are application dependent. If the inputs are from
the real world, they can be real numbers or in binary form (0, 1); in some
cases it is useful to use bipolar form (-1, +1). If the inputs are from
other processing elements, they are real numbers with magnitude less than or
equal to one.

A PE performs a simple operation. The output from the PE is the weighted sum
of the inputs passed through a non-linear activation function. Activation
functions are incorporated into neural-network models to emulate
thresholding effects present in biological processes, and are necessary for
effectively training the network.

A neural network is a set of interconnected PEs. There are many ways that
PEs can be interconnected. A number of different interconnection
architectures having different performance characteristics are popular in
the literature. These have names like Counter-Propagation, Back-Propagation,
Adaptive Resonance, Hopfield Networks, Kohonen Memories, and many more. Each
architecture has properties that make it useful for different applications.
Common to all architectures is the simple processing element with variable
weight interconnections.

In each architecture, PE outputs are determined by a class of simple,
monotonically increasing activation functions acting on the weighted sums of
their inputs. These output functions range from "digital" threshold
functions to totally linear functions, with the most interesting networks
having non-linear functions somewhere between these extremes (Fig. 3).
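
A single processing element, as described in the Processing Elements sidebar
and in Fig. 2, is a weighted sum of inputs passed through an activation
function. The following is a minimal sketch under stated assumptions: the
logistic sigmoid is chosen here as the "in between" non-linear function (the
article does not name a specific one), and the input and weight values are
made up for illustration.

```python
import math


def threshold(s):
    """A "digital" threshold activation: output is 0 or 1."""
    return 1.0 if s >= 0.0 else 0.0


def linear(s):
    """A totally linear activation: the weighted sum passes through unchanged."""
    return s


def sigmoid(s):
    """A smooth, monotonically increasing non-linear activation.

    Assumption: the logistic function stands in for the article's
    unspecified non-linear activation.
    """
    return 1.0 / (1.0 + math.exp(-s))


def pe_output(inputs, weights, activation):
    """output = f(sum of w_i * x_i), one weight per input."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# Illustrative values only.
x = [1.0, 0.0, 1.0]
w = [0.5, -0.3, 0.25]

print(pe_output(x, w, linear))     # the raw weighted sum
print(pe_output(x, w, threshold))  # hard decision
print(pe_output(x, w, sigmoid))    # graded value between 0 and 1
```

Because the sigmoid's output always lies between 0 and 1, it is consistent
with the sidebar's remark that inputs arriving from other processing
elements have magnitude less than or equal to one.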
The Back-Propagation Network

■ The back-propagation network is popular because it is powerful, fairly
easy to implement, and well understood. It has the ability to learn mappings
by example; that is, it will generalize input-output pair relationships.

Most back-propagation models consist of three layers of processing elements
(PEs) (see Fig. 4). The input layer collects inputs from the real world and
distributes these to the hidden layer. Each output from the first layer
forms an input to every PE in the hidden layer. Outputs from PEs in the
hidden layer are distributed in the same fashion to the output layer. The
outputs from this layer form real-world outputs. Inputs to the network can
be real numbers or binary in form (1's and 0's) depending on the
application. Outputs can also be real or thresholded to binary form.

Since the back-propagation network has the ability to learn mappings, it is
useful to discuss what is meant by a mapping. A mapping is some function
that converts an input data set to an output data set. For example, the
conversion of text to speech is a mapping. The complexity of this mapping is
evidenced by the algorithms used in commercial speech synthesizers.

The interconnect weights to each processing element in the back-propagation
network determine what mapping the network performs. Determining a set of
weights that performs an optimal mapping would seem formidable, but we can
take advantage of learning algorithms to automate this task. We can let the
network determine the best weights by using an automated procedure that
allows the network to improve performance through practice.

The goal of the learning procedure is to minimize the average squared error
between the computed values of the network's outputs and the desired
outputs. Each weight is modified in proportion to the amount of error that
it contributes to the overall error for all patterns presented. Modification
moves weight values in a direction that decreases the overall output error
(a gradient descent algorithm).

1. Start with a back-propagation network and a randomized set of starting
weights.

2. Collect a large set of input-output pairs that are representative of the
mapping to be performed.

3. Present one input to the network and determine the network's output.

4. Find the error between this output and the desired output.

5. "Back propagate" this output error to weights of elements in the output
layer as well as weights of the hidden layer. That is, modify the weights so
as to decrease the error for this pattern. The amount of change necessary
for each weight is governed by the learning rules for the back-propagation
network, which are based on a gradient descent algorithm.

Repeat steps 3 through 5 for the second input in the training set. Determine
its associated error and again modify the weights. Obviously, how much the
weights are modified will greatly affect how well the network weights can be
adjusted to improve performance for the second input while not losing much
performance for the first input. The trick is to determine the error and
weight change that best accommodates both training inputs. Such a
"statistical" feature can be built into the learning algorithm in several
ways.

The presentation of inputs and "statistical" weight adjustment continues
throughout the training set. After the complete training set has been
presented, the overall performance error of the network is lower than prior
to training. The overall error can be decreased toward an asymptotic minimum
by repeating this training process. Indeed, it may be necessary to present
the training set hundreds or even thousands of times before the overall
error is low enough for a particular application.

Once the network is trained, it can map any input from the training set to
its corresponding element in the output set with a relatively high degree of
accuracy. The network has generalized and "learned" the mapping implied by
the input-output pairs. Note that it is not necessary to consider all
possible input-output pairs, just enough to statistically represent the
relationship between them. Novel input data can now be applied to the
network and it will respond with an appropriate output based on its
internal, self-generated representation of the implied mapping.
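
The training procedure in this sidebar can be sketched in plain Python. This
is a small illustration under stated assumptions, not the NETtalk
implementation: it assumes sigmoid processing elements, a bias weight on
every element, a fixed learning rate of 0.5, weight changes after every
pattern, and a toy exclusive-or mapping as the training set. All function
and variable names are hypothetical.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(x, w_hidden, w_output):
    """Step 3: present an input and compute hidden and output values.

    Each weight row ends with a bias weight, fed by a constant input of 1.0.
    """
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
              for row in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
              for row in w_output]
    return hidden, output


def train_pattern(x, target, w_hidden, w_output, rate):
    """Steps 4 and 5: find the output error and back-propagate it."""
    hidden, output = forward(x, w_hidden, w_output)
    # Error signal at each output element (gradient of squared error).
    d_out = [(t - y) * y * (1.0 - y) for t, y in zip(target, output)]
    # Error signal at each hidden element, using the pre-update output weights.
    d_hid = [h * (1.0 - h) * sum(d_out[k] * w_output[k][j]
                                 for k in range(len(d_out)))
             for j, h in enumerate(hidden)]
    # Move every weight a small step in the direction that reduces the error.
    for k, row in enumerate(w_output):
        for j, v in enumerate(hidden + [1.0]):
            row[j] += rate * d_out[k] * v
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            row[i] += rate * d_hid[j] * v


def mean_squared_error(pairs, w_hidden, w_output):
    total = 0.0
    for x, target in pairs:
        _, output = forward(x, w_hidden, w_output)
        total += sum((t - y) ** 2 for t, y in zip(target, output))
    return total / len(pairs)


# Step 2: a toy set of input-output pairs (exclusive-or), for illustration.
pairs = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
         ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

# Step 1: a small three-layer network with randomized starting weights.
random.seed(0)
n_in, n_hid, n_out = 2, 4, 1
w_hidden = [[random.uniform(-1, 1) for _ in range(n_in + 1)]
            for _ in range(n_hid)]
w_output = [[random.uniform(-1, 1) for _ in range(n_hid + 1)]
            for _ in range(n_out)]

initial_error = mean_squared_error(pairs, w_hidden, w_output)
for _ in range(3000):  # repeated passes through the training set
    for x, target in pairs:
        train_pattern(x, target, w_hidden, w_output, rate=0.5)
final_error = mean_squared_error(pairs, w_hidden, w_output)
print(initial_error, final_error)
```

The learning rate plays the role the sidebar describes: it controls how far
the weights move for one pattern, and therefore how much performance on
earlier patterns is disturbed. How low the final error gets depends on the
starting weights and the number of passes.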